Genome-Wide Comparison and Functional Characterization of HMGR Gene Family Associated with Shikonin Biosynthesis in Lithospermum erythrorhizon

3-hydroxy-3-methylglutaryl-CoA reductase (HMGR), as the rate-limiting enzyme in the mevalonate pathway, is essential for the biosynthesis of shikonin in Lithospermum erythrorhizon. However, in the absence of sufficient data, the principles of a genome-wide in-depth evolutionary exploration of HMGR family members in plants, as well as key members related to shikonin biosynthesis, remain unidentified. In this study, 124 HMGRs were identified and characterized from 36 representative plants, including L. erythrorhizon. Vascular plants were found to have more HMGR family genes than nonvascular plants. The phylogenetic tree revealed that during lineage and species diversification, the HMGRs evolved independently and intronless LerHMGRs emerged from multi-intron HMGR in land plants. Among them, Pinus tabuliformis and L. erythrorhizon had the most HMGR gene duplications, with 11 LerHMGRs most likely expanded through WGD/segmental and tandem duplications. In seedling roots and M9 cultured cells/hairy roots, where shikonin biosynthesis occurs, LerHMGR1 and LerHMGR2 were expressed significantly more than other genes. The enzymatic activities of LerHMGR1 and LerHMGR2 further supported their roles in catalyzing the conversion of HMG-CoA to mevalonate. Our findings provide insight into the molecular evolutionary properties and function of the HMGR family in plants and a basis for the genetic improvement of efficiently produced secondary metabolites in L. erythrorhizon.


Introduction
Plant-derived medicinal natural products are important sources of medicines for treating a wide range of diseases. Lithospermum erythrorhizon Sieb. et Zucc, an important medicinal plant in East Asian and Western traditional medicine, can biosynthesize red naphthoquinone compounds, namely shikonin and its derivatives, in its root periderm [1]. These compounds have been shown to have anti-HIV [2], antioxidant [3], anti-inflammatory [4,5], wound healing [6], and other pharmacological activities [7][8][9], as well as to trigger cancer cell apoptosis, making them yet another promising natural anti-tumor drug [10,11]. Furthermore, these metabolites are widely used natural raw materials for making cosmetics and dyes that have a high market value.
However, due to the difficulties of cultivating L. erythrorhizon and the low amount of root periderm, plant-based production is seriously unable to meet the global demand.

Evolution and Characterization of the HMGR Gene Family in Plants
To explore the evolution of the HMGR gene family in plants, we constructed a phylogenetic tree using all 124 HMGR protein sequences from 29 plants. The results showed that the taxonomic position of HMGRs in the seven lineages was consistent with the plant's evolutionary order, indicating that plant HMGRs developed into different branches after the lineages diverged ( Figure 1A). LerHMGRs roughly divided into three portions in eudicots lineages, all of which closely related to the Lamiids plant's HMGRs. LerHMGR5~LerHMGR7 was closely related to 10 members (SlyHMGR3/4/2/1, StuH-MGR1/5/2/3/6, CcaHMGR1) from S. lycopersicum, S. tuberosum, and C. canephora; the six LerHMGRs (LerHMGR1~LerHMGR4, LerHMGR8, LerHMGR9) were further classified into two broad categories that are more closely related to Lamiales SinHMGR2, OueHMGR5, and OueHMGR1; the closest member of LerHMGR10 and LerHMGR11 were seven members (OeuHMGR4/2/8/7, SinHMGR1, CcaHMGR2, StuHMGR4).
Introns are closely related to gene expression, transcription shearing, and other processes, and they may play a key role in plant adaptability and evolution. Gene structure analysis using the GSDS tool predicted that the number of introns in all HMGRs ranged from 0 to 6, with 69.35% having three introns ( Figure 1B, Table S2). LerHMGR1-LerHMGR4, LerHMGR8, LerHMGR9, and CbrHMGR are intronless genes ( Figure 1B), implying that intronless CbrHMGR has undergone a unique evolution process and that intronless LerHM-GRs emerged from multi-intron HMGR in land plants, and these six LerHMGR genes may be more involved in the quick response to stress than other members of the gene family [38]. In addition, we identified an intron in the 5 UTR region of AspHMGR2, AspHMGR3, AspHMGR7, and ZmaHMGR4. These 5 UTR introns may increase the promoter activity or translation efficiency of the corresponding genes [39,40] (Figure 1B, Table S2).
Physicochemical properties and the identification of 10 motifs in 124 plant HMGRs showed that most HMGRs were conserved throughout the evolution of lower plants into higher plants ( Figure 1C and Figure S1, Table S2). In the catalytic domains, most of them have three conserved motifs: motif 3 represents the two HMG-CoA binding sites (EMPV-GYVQIP and TTEGCLVA) and motif 4 and motif 5 represent the two NADP(H) binding sites (DAMGMNM and GTVGGGT) ( Figure 1D). Among them, the second NADP(H) binding site was detected in all 12 HMGRs with motif 5 deletion, which may be due to the low conserved flanking sequence of this binding site ( Figure 1C). In addition, motif 8 or motif 9, which contain the transmembrane helix typically possessed by plant HMGRs, were not found in the N-terminal region of 15 HMGRs, including one branch of monocots ( Figure 1C and Figure S1). Among them, the absence of motif 8 and motif 9 does not correspond to TMHMM Server v.2.0 s prediction due to the presence of one transmembrane helix in MviHMGR2, PtaHMGR6, PtaHMGR13, and SbiHMGR2. This may be due to the diversity of the sequences.

Gene Expansion of HMGR Gene Family in L. erythrorhizon
Gene duplications and loss play a significant factor in plant evolution and plant fitness [41]. To further understand the evolution of the HMGR gene family, we performed gene duplication and loss analysis in 19 representative plants from seven lineages using Notung software by reconciling gene phylogenetic trees and species taxonomy common trees ( Figure 2A). Based on the number of variations found in HMGR family genes at different evolutionary stages, HMGR ancestral genes were duplicated prior the emergence of Mesostigma viride, but none were lost (Figure 2A). No genes were duplicated or lost in the lineage of the common ancestor of two Gymnospermae; however, 16 genes were duplicated in Pinus tabuliformis after its divergence from Cycas panzhihuaensis (Figure 2A). Similarly, no genes were lost or duplicated in the lineage of the common ancestor of the four monocots and the six eudicots; but, three and seven genes, respectively, were duplicated after the divergence of the ancestors of the monocots and the eudicots (Figure 2A). Moreover, there was a wide range of HMGR genes that were duplicated and lost during the evolution of the six eudicots plants. Among the eudicots plants, the HMGR genes of L. erythrorhizon experienced the most duplications ( Figure 2A). the lineage of the common ancestor of two Gymnospermae; however, 16 genes were duplicated in Pinus tabuliformis after its divergence from Cycas panzhihuaensis (Figure 2A). Similarly, no genes were lost or duplicated in the lineage of the common ancestor of the four monocots and the six eudicots; but, three and seven genes, respectively, were duplicated after the divergence of the ancestors of the monocots and the eudicots ( Figure 2A). Moreover, there was a wide range of HMGR genes that were duplicated and lost during the evolution of the six eudicots plants. Among the eudicots plants, the HMGR genes of L. erythrorhizon experienced the most duplications ( Figure 2A). To elucidate the expansion mechanisms of the LerHMGR gene family in L. erythrorhizon, we analyzed the genomic location and duplication types of each LerHMGR. Then the synthetic blocks were identified in the entire L. erythrorhizon genome, as well as between it and the O. europaea and S. indicum genomes. We also calculated the median synonymous substitution rates (Ks) of the syntenic fragments containing LerHMGR genes (Table S3). In the results of the synteny analysis between the genomes of L. erythrorhizon, O. europaea, and S. indicum, LerHMGR5 was collinear with OeuHMGR3, while LerHMGR7 was collinear with SinHMGR1 ( Figure 2B). It suggested that most LerHMGRs were acquired through species-specific duplication events during plant evolution.
The genomic location showed that these 11 LerHMGRs were located on eight different contigs ( Figure 2C). Four members (i.e., LerHMGR2, LerHMGR4, LerHMGR8, and Ler-HMGR9) might be derived from tandem duplications, as indicated by their genomic loci: LerHMGR2, LerHMGR3, and LerHMGR4 were closely located in the contig02634, while LerHMGR8 and LerHMGR9 were closely located in the contig03594 ( Figure 2C, Table 1). To elucidate the expansion mechanisms of the LerHMGR gene family in L. erythrorhizon, we analyzed the genomic location and duplication types of each LerHMGR. Then the synthetic blocks were identified in the entire L. erythrorhizon genome, as well as between it and the O. europaea and S. indicum genomes. We also calculated the median synonymous substitution rates (Ks) of the syntenic fragments containing LerHMGR genes (Table S3). In the results of the synteny analysis between the genomes of L. erythrorhizon, O. europaea, and S. indicum, LerHMGR5 was collinear with OeuHMGR3, while LerHMGR7 was collinear with SinHMGR1 ( Figure 2B). It suggested that most LerHMGRs were acquired through species-specific duplication events during plant evolution.

Gene Name Duplication Types
LerHMGR1 Dispersed duplication LerHMGR8 Tandem duplication LerHMGR9 Tandem duplication LerHMGR10 WGD/Segmental duplication LerHMGR11 WGD/Segmental duplication Furthermore, the Ks distribution was analyzed by calculating the Ks value of 8257 synteny gene pairs based on Tang's published L. erythrorhizon genome data [42] (Table S4). It was found that there were two peaks (Ks = about 0.088 and 0.376), indicating that the L. erythrorhizon genome may have undergone two rounds of genome-wide duplication ( Figure S2). In addition, three syntenic blocks containing LerHMGR genes were identified, as shown in Figure 3D. The syntenic block containing LerHMGR5 and LerHMGR6 had a median Ks of 0.395, which was approximate to the multi-locus peak (~0.376) in the Ks distribution (Tables S3 and S4). It is possible that the duplications arose via the WGD event. The syntenic block containing LerHMGR1 and LerHMGR3 had a median Ks of 0.444, and the syntenic block containing LerHMGR10 and LerHMGR11 had a medianKs of 0.053 (Table S3), implying that the LerHMGR family may have undergone one ancient and one recent segmental duplication. and ERE were identified as high-frequency elements in the promoters of LerHMGRs (Figure S3). Cluster analysis revealed that LerHMGR2 contained the fewest light-responsive elements and total cis-acting elements ( Figure 3). To summarize, these results suggest that transcript levels of LerHMGRs may be regulated by components in light, hormone, and stress response regulatory pathways associated with the accumulation of shikonin in L. erythrorhizon.

Expression Patterns Revealed the Possible Critical Role of LerHMGR1 and LerHMGR2 for the Biosynthesis of Shikonin and Its Derivatives.
Given that the production of secondary metabolites is associated with the tissue-specific expression of related genes [47], and that shikonin accumulates abundantly in the root [48] and M9 in the dark [49], genes involved in the shikonin pathway should be highly expressed under these conditions. We first screened for major LerHMGRs involved in shikonin biosynthesis using publicly available transcriptome data from six tissues (the mature root (MR), periderm of mature root (PD), cortex of mature root (CT), stele of mature

Cis-Acting Elements Revealed the Possible Transcription Regulation of LerHMGRs in L. erythrorhizon
The PlantCARE tool was used to scan the 2000 bp upstream promoter regions of LerHMGR genes for putative cis-acting elements that regulate their expression, and a total of 2019 cis-acting elements of 80 types were found in all LerHMGRs promoters (Table S5). Among them, 17 types of light-responsive elements were identified, including Box 4, the GAmotif, the G-box, and the MYB binding site involved in light responsiveness (MRE), which are consistent with the photoinhibition regulation of shikonin biosynthesis [43] (Figure 3). Meanwhile, 12 hormone-related cis-elements were identified, with the majority of LerHMGR promoters containing responsive binding sites-MeJA-responsive element (CGTCA-motif, TGACG-motif), ethylene-responsive element (ERE), and auxin-responsive element (TGAelement), implying that LerHMGR genes may be regulated by jasmonic acid, ethylene, or auxin, all of which have been shown to promote shikonin biosynthesis [44][45][46] (Figure 3). There are 15 distinct types of stress-related elements, including the anaerobic induction regulatory element (ARE), defense-and stress-responsive element (TC-rich repeats), lowtemperature responsiveness element (LTR), and the MYB binding site involved in droughtinducibility (MBS), and these stress conditions are analogous to those found in shikoninproducing species (Figure 3). Additionally, the presence of numerous growth-related components suggests that LerHMGRs may contribute to the growth of L. erythrorhizon ( Figure 3). Overall, certain elements such as Box 4, STRE, ARE, TGACG-Box, and ERE were identified as high-frequency elements in the promoters of LerHMGRs ( Figure S3). Cluster analysis revealed that LerHMGR2 contained the fewest light-responsive elements and total cis-acting elements ( Figure 3). To summarize, these results suggest that transcript levels of LerHMGRs may be regulated by components in light, hormone, and stress response regulatory pathways associated with the accumulation of shikonin in L. erythrorhizon.

Expression Patterns Revealed the Possible Critical Role of LerHMGR1 and LerHMGR2 for the Biosynthesis of Shikonin and Its Derivatives
Given that the production of secondary metabolites is associated with the tissuespecific expression of related genes [47], and that shikonin accumulates abundantly in the root [48] and M9 in the dark [49], genes involved in the shikonin pathway should be highly expressed under these conditions. We first screened for major LerHMGRs involved in shikonin biosynthesis using publicly available transcriptome data from six tissues (the mature root (MR), periderm of mature root (PD), cortex of mature root (CT), stele of mature root (SE), leaves + stems (ML), and flowers (FL)) of the L. erythrorhizon seedlings and two growth conditions (wild-type hairy root tissue cultures grown in M9 in the dark versus B5 in the light) [22,50]. The results showed most LerHMGRs had expression levels in the mature root, root periderm, and hairy root growing in M9 under darkness, which is the primary site for the accumulation of shikonin and its derivatives, but LerHMGR2 had the highest level of expression, followed by LerHMGR1 ( Figure 4A). The TPM value of LerHMGR2 in the mature root and periderm was 19~3880 times and 8.4~20,058 times higher than that of other genes except LerHMGR1, respectively ( Figure 4A). Additionally, the TPM value of LerHMGR1 in the mature root was 4.43~902.37 times that of other genes, and that of LerHMGR1 in periderm was 2.77~6626 times that of other genes ( Figure 4A). Furthermore, the TPM values of LerHMGR2 and LerHMGR1 in hairy roots under M9 dark culture were 17.18~491.2 times and 5.88~190.34 times those of other genes, respectively ( Figure 4A).
HMGR2 in the mature root and periderm was 19~3880 times and 8.4~20,058 times higher than that of other genes except LerHMGR1, respectively ( Figure 4A). Additionally, the TPM value of LerHMGR1 in the mature root was 4.43~902.37 times that of other genes, and that of LerHMGR1 in periderm was 2.77~6626 times that of other genes ( Figure 4A). Furthermore, the TPM values of LerHMGR2 and LerHMGR1 in hairy roots under M9 dark culture were 17.18~491.2 times and 5.88~190.34 times those of other genes, respectively ( Figure 4A).  and darkness for different times (0 h, 6 h, 1 day, 3 day, and 6 day) was detected using qPCR. The error bars indicate the SDs of three replicates. Asterisks represent significant differences via the Student's t-test analysis (* p < 0.05, ** p < 0.01, *** p < 0.001, **** p < 0.0001) as compared with root in (B) and 0 h in (C), respectively. Furthermore, due to the inability to obtain the amplification product for LerHMGR11, the relative expression levels of the other ten LerHMGRs were respectively determined using real-time quantitative PCR (qPCR) in the three tissues (roots, stems, and leaves) of L. erythrorhizon seedlings and L. erythrorhizon callus cells cultured in M9 in the dark at different time points (0 h, 6 h, 1 day, 3 day, and 6 day). The results roughly corroborated the transcriptome data, where LerHMGR2 expression was highly induced in the root compared to other homologs ( Figure 4B), and its relative expression increased 49.85-fold in the M9 dark culture at 6 day ( Figure 4C). LerHMGR1 s relative expression increased 8.21-fold in the M9 dark culture ( Figure 4C). In contrast, other members only increased by a maximum of 4.45-fold at all testing time points. These findings suggest that LerHMGR2 and LerHMGR1 may be the major regulators of shikonin biosynthesis.

LerHMGR1 and LerHMGR2 Are Localized to the Endoplasmic Reticulum
Proteins can perform their biological functions only if they have the correct subcellular localization. To confirm whether LerHMGR1 and LerHMGR2 actually have enzymatic activities to catalyze the conversion of HMG-CoA and NADPH into mevalonate, we first transiently expressed LerHMGR1-eGFP and LerHMGR2-eGFP in tobacco leaf cells to determine the subcellular localization. The green fluorescence of the HMGR-eGFP fusion protein was only detected in the endoplasmic reticulum (ER) labeled with the marker protein mCherry-HDEL, whereas the eGFP protein had no specific localization distribution in tobacco cells, which is consistent with the results demonstrating the insertion of Arabidopsis HMGR localization into the ER [51]. This suggests that LerHMGR1 and LerHMGR2 are predominantly localized at the ER in plant cell ( Figure 5A, Table S2). using real-time quantitative PCR (qPCR) in the three tissues (roots, stems, and leaves) of L. erythrorhizon seedlings and L. erythrorhizon callus cells cultured in M9 in the dark at different time points (0 h, 6 h, 1 day, 3 day, and 6 day). The results roughly corroborated the transcriptome data, where LerHMGR2 expression was highly induced in the root compared to other homologs ( Figure 4B), and its relative expression increased 49.85-fold in the M9 dark culture at 6 day ( Figure 4C). LerHMGR1′s relative expression increased 8.21fold in the M9 dark culture ( Figure 4C). In contrast, other members only increased by a maximum of 4.45-fold at all testing time points. These findings suggest that LerHMGR2 and LerHMGR1 may be the major regulators of shikonin biosynthesis.

LerHMGR1 and LerHMGR2 Are Localized to the Endoplasmic Reticulum
Proteins can perform their biological functions only if they have the correct subcellular localization. To confirm whether LerHMGR1 and LerHMGR2 actually have enzymatic activities to catalyze the conversion of HMG-CoA and NADPH into mevalonate, we first transiently expressed LerHMGR1-eGFP and LerHMGR2-eGFP in tobacco leaf cells to determine the subcellular localization. The green fluorescence of the HMGR-eGFP fusion protein was only detected in the endoplasmic reticulum (ER) labeled with the marker protein mCherry-HDEL, whereas the eGFP protein had no specific localization distribution in tobacco cells, which is consistent with the results demonstrating the insertion of Arabidopsis HMGR localization into the ER [51]. This suggests that LerHMGR1 and Ler-HMGR2 are predominantly localized at the ER in plant cell ( Figure 5A, Table S2).

Functional Identification of LerHMGR1 and LerHMGR2 In Vitro
Subsequently, in order to investigate whether LerHMGR1 and LerHMGR2 are active enzymes, LerHMGR1 and LerHMGR2 were expressed heterologously in the Escherichia coli strain BL21 (DE3). SDS-PAGE and Western blot can both detect two specific fusion proteins with a molecular weight of approximately 70 KDa (LerHMGR1/LerHMGR2:~60.4 KDa, two his-tag:~1.68 KDa, S-tag:~8.1 KDa) ( Figure 5B), indicating that the protein was successfully expressed and isolated from the supernatant of the resultant recombinant E. coli. Furthermore, using purified protein, the enzyme abilities of these two LerHMGRs were tested in vitro. The reaction product was analyzed using UPLC-Triple-TOF-MS/MS, and the special ion current peak or mass fragmentation pattern of mevalonolactone (a mevalonate esterification product) was detected at 1.712 and 1.702 min, and the UPLC/Triple TOF 4600+ m/z values were 131.0703 Da and 131.0705 Da, respectively, which are consistent with the results for mevalonolactone as standard ( Figure 5C and Figure S4). However, no distinct peak of mevalonolactone was observed in the controls, and the UPLC/Triple TOF 4600+ m/z value was different from 131.07 Da ( Figure 5C). The results indicated that both LerHMGR1 and LerHMGR2 had the ability to catalyze the conversion of HMG-CoA and NADPH into MVA.

Discussion
Shikonin extracted from L. erythrorhizon has a wide spectrum of medical and commercial values, and its biosynthesis is jointly regulated by various enzymes that are members of multiple gene families, especially HMGR in the upstream MVA pathway, and PGT in the downstream pathway to link the upstream MVA and phenylpropanoid pathways [22,52]. HMGR, which catalyzes the formation of MVA from HMG-CoA, is the first key rate-limiting enzyme in the metabolic synthesis of shikonin and terpenoids [29,52]. In this study, we successfully identified 11 LerHMGRs from L. erythrorhizon and 113 HMGR family genes from the whole genome sequence of the other 35 representative plants, and their characteristics, phylogenetic relationships, duplications and losses, and expansion events were systematically studied. It complements HMGR gene family analysis in some species recently sequenced [32]. More importantly, through expression patterns analysis and in vitro enzyme activity, we hypothesized that LerHMGR1 and LerHMGR2 play key rate-determining roles in shikonin biosynthesis.
Our analysis revealed that more HMGR family genes were detected in higher plants than in lower plants, particularly in P. tabuliformis, L. erythrorhizon, and O. europaea, which may be due to the large number of transposons in Gymnosperms [37] and the multiple WGD or WGT events in Angiosperms [53], leading to the duplication of more HMGR genes to adapt to the ecological environment. The analysis of LerHMGR's evolutionary status and expansion events demonstrated the importance of WGD, segmental duplication, and tandem duplication in the evolution of the shikonin/alkannin pathway in L. erythrorhizon, which is consistent with other studies of specialized metabolisms in plants [54][55][56].
We identified a total of 11 LerHMGR family members from the L. erythrorhizon genome published by Tang et al. [42], which is contradictory to the number of LerHMGR family members we identified from the genome published by Auber et al. in 2020 [22]. Blast analysis results for two HMGR group sequences showed that the eight LerHMGRs identified from the genome by Auber et al. belong to the eleven LerHMGR family members identified from the genome by Tang et al. (Table S6). Additionally, ten LerHMGRs sequences were further confirmed via amplification using specific primers. The 433 bp sequence fragment of HMGR previously cloned from L. erythrorhizon [36] is part of LerHMGR1 identified in this study. However, LerHMGR11 could not be amplified using specific primers, and its expression levels in L. erythrorhizon roots, stems, and leaves, as well as hairy roots in a B5 light culture and M9 dark culture, were not detected when analyzed by qPCR, which may be due to the low expression level of LerHMGR11.
HMGR is mostly found in plants as a gene family [55] and its importance may stem from the fact that it can maintain beneficial metabolite production, that their functional redundancy allows plants to survive adversity or stress, or that each member is in charge of a different final compound synthesis [57]. We investigated the expression patterns of LerHMGRs in the tissues and their responses to M9 medium in the darkness (Figure 4). With the exception of LerHMGR2, which showed a significant increase in hairy roots/callus cultured in M9 in the darkness compared to B5 in the light, as well as a root-specific expression where shikonin was biosynthesized, LerHMGR1 also showed an up-regulated expression in the M9 dark culture and root, while the expression levels and induction amplitude of other homologous genes were lower or even had no difference in the M9 dark culture. These findings suggest that LerHMGR2 and LerHMGR1 may play a major and redundant role in shikonin biosynthesis. Additionally, we noticed that other LerHMGRs, such as LerHMGR3~LerHMGR10, have an obvious level of expression in leaves, stems, or roots. This implies that the LerHMGRs members may have distinct roles in plant growth. For example, LerHMGR6 is mainly highly expressed in flowers, which may be related to the synthesis of volatile terpenoids or pigment lipid-soluble terpenoids in flowers [58]. LerHMGR7, on the other hand, is highly expressed in leaves and may be involved in the accumulation of terpenoids. Similar to the expression levels of the TmHMGR gene in Taxus media which was generally consistent among leaves, roots, and stems, LerHMGR10 may also be constitutionally expressed and involved in regulating basic physiological metabolism [59].
LerHMGR1 and LerHMGR2 are the only members of the LerHMGR gene family that are highly specifically expressed in both the root and M9 dark culture, and their catalytic functions of MVA biosynthesis have been demonstrated by enzyme activity experiments in vitro. Regrettably, we did not clarify the contribution ratio of LerHMGR1 and LerHMGR2 to the accumulation of shikonin. This issue needs to be explored further by comparing the kinetic properties of LerHMGR1 and LerHMGR2, as well as developing transgenic hairy roots with specific gene targets using the Crispr/cas9 approach and overexpression techniques. Furthermore, while the induced expression of all LerHMGRs except LerHMGR1 and LerHMGR2 was very low, the expression of most genes in B5 to M9 followed a timedependent pattern of first increasing, then decreasing, and then increasing, which was attributed to the stimulating effect during medium conversion. However, the expression of LerHMGR2 increased continuously throughout time, suggesting that LerHMGR2 may play a significant role in the biosynthesis of shikonin in response to various environmental and developmental factors.
The molecular mechanism of the post-translational regulation of HMGR in Arabidopsis has been extensively studied: the activity of HMGR1S and HMGR1L is negatively regulated by the phosphorylation of two B-type regulatory subunits of Ser/Thr protein phosphatase 2A (PP2A) phosphatase: B α and B β [60,61]. However, research on the transcriptional regulation of HMGR in higher plants is currently restricted. At the transcriptional level, the expression of HMGR is regulated by proteins that bind to the sterol regulatory elements of the HMGR promoter [62]. To further explore the transcription factors that may regulate the HMGR transcript level, we first identified 2419 candidate transcription factors from the L. erythrorhizon genome (Table S7), and then performed co-expression network analysis using all transcription factors, LerHMGR1 and LerHMGR2, based on transcriptomes data (Table S8). As shown in Figure S5, LE19274.1, LE10602.1, and LE00798.1, belonging to C2H2, and LE28791.1, belonging to AP2/ERF-ERF, had significant positive correlations with LerHMGR1 and LerHMGR2 (r > 0.65, p < 0.05), suggesting that they may be transcription factors that positively regulate the HMGRs.
In conclusion, we thoroughly investigated the HMGR gene family in 36 representative plants. Exploring the evolutionary history of the gene family and identifying the key school genes involved are a necessary basis for improving the shikonin biosynthesis pathway and increasing shikonin production through genetic engineering.

Plant Materials and Treatment
Lithospermum erythrorhizon Sieb. et Zucc seeds collected in Inner Mongolia, China, were allowed to germinate on moist sand in the dark at 4 • C for about one month. Plants were grown on soil in growth chambers at 25 • C with a 16 h day/8 h night photoperiod, 100 µE·m −2 ·s −1 light intensity, and 60-70% relative humidity. The L. erythrorhizon suspension culture cell lines were made from the radicle of L. erythrorhizon. They were grown in B5 medium in the light and then shikonin production was induced in M9 medium in the darkness.

Identification of HMGR Family Genes
The Pfam domain PF00368 is found in Pfam protein family databases (http://pfam. xfam.org/, accessed on 25 April 2023) and HMMER 3.0 (https://github.com/PhyreEngine/ conda-hmmer, accessed on 25 April 2023) was used to search the genomes of 36 plants for HMGR genes. The download addresses for genome files of all species are listed in Table  S1. Then redundant sequences and abnormal sequences (incomplete PF00368 domain) identified by Batch CD-Search (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb. cgi, accessed on 25 April 2023) were removed. Eleven gene sequences of the HMGR family in L. erythrorhizon genome were obtained, then used to blast against the eight LerHMGRs from the genome published by Auber et al. using all-to-all blastp, and primers (Table  S9) were designed for amplification with cDNA mixtures of seedling leaves and roots as template to verify all sequences in L. erythrorhizon using Phanta Max Master Mix (Vazyme, #P515, Nanjing, China).

Bioinformatics Analysis
For HMGR phylogenetic tree construction, the amino acid sequences of the identified HMGR family members were aligned via MAFFT v7.310 [63]. Subsequently, the preliminary alignment was trimmed using trimAL v.1.2.rev59 (key parameter: −gt 0.50) [64]. The trimmed alignment was used to construct the phylogenetic tree using IQ-TREE multicore version 2.0.3 and 1000 bootstrap replications [65]. The conserved motif analysis was performed in the MEME program (https://meme-suite.org/meme/tools/meme, accessed on 25 April 2023) using full-length amino acid sequences, as the default setting was 10 for the motif number. The number of introns and exons was analyzed using the Gene Structure Display Server 2.0 (http://gsds.gao-lab.org/, accessed on 25 April 2023). The cis-acting elements of the promoter region (2000 bp sequence upstream of the gene) of the LerHMGR genes were analyzed using the PlantCARE program (http://bioinformatics.psb.ugent.be/ webtools/plantcare/html/, accessed on 25 April 2023). The physicochemical properties, including theoretical Mw, theoretical pI, aliphatic index, and grand average of hydropathicity of HMGRs, were analyzed by Ex-PaSy (http://web.expasy.org/protparam/, accessed on 25 April 2023). The prediction of transmembrane helices in HMGR was performed using the TMHMM Server v.2.0 (https://services.healthtech.dtu.dk/services/TMHMM-2.0/, accessed on 25 April 2023). Gene synteny and duplication types of HMGRs were analyzed using the 'MCScanX' and 'duplicate_gene_classifier' programs implemented in the MC-ScanX package [66]. The Ks of gene pairs were calculated with TBtools software (V1.098696) using the Simple Ka/Ks calculator [67]. For the Ks frequency distribution, MCScanX [66] was used to identify synthetic blocks within the genome of L. erythrorhizon and extract information on synteny gene pairs. Then, the wgd package (https://github.com/arzwa/wgd, accessed on 25 April 2023) was used to calculate the Ks values between synteny gene pairs, and ggplot2 (https://github.com/tidyverse/ggplot2, accessed on 25 April 2023) was used to plot histograms and fit curves (Ks values less than 0.05 and greater than 5 were filtered out).

Duplication/Loss Detection of HMGR Gene Family
The phylogenetic trees of 21 species and their HMGRs were input into the Notung-2.9.1.5 software [68] for the detection of duplications and loss of HMGR family genes, respectively. The phylogenetic trees of 21 species were made using the iTOL (https:// itol.embl.de/, accessed on 25 April 2023) according to the relationship of species in NCBI taxonomy [69].

RNA-Seq Experiments
RNA-seq data from L. erythrorhizon whole root (MR), root periderm (PD), root cortex (CT), root stele (SE), leaves + stems (ML), flowers (FL), and hairy roots grown in M9 in the dark and B5 in the light were downloaded and used for gene expression analysis from the NCBI SRA (accession ID: SRP141330, SAMN13650849, SAMN13650867) [22,50]. The expression level of each gene was calculated using RNA sequencing quantification analysis with the transcripts per kilobase of the exon model per million mapped reads method by TPMCalculator [70]. A log2(TPM + 1) value was used to quantify the expression level of each LerHMGR gene, and a clustering heatmap was drawn in R (package: pheatmap).

RNA Extraction and RT-qPCR Analysis
Total RNA was extracted from cultured cells, hairy roots, and different tissues of L. erythrorhizon using the FastPure Plant Total RNA Isolation Kit (Vazyme, #RC401, Nanjing, China). The RNA purity and integrity were assessed based on the A260/A280 absorbance ratio and 1.0% agarose gel electrophoresis. cDNA was synthesized by reverse transcription with the HiScript III 1st Strand cDNA Synthesis Kit (+gDNA wiper) (Vazyme, #R312), and qPCR was performed using the ChamQ Universal SYBR qPCR Master Mix (Vazyme, #Q711) with gene-specific primers (Table S9) on an Applied Biosystems 7500 Real-Time PCR System and StepOnePlus™ Real-Time PCR System. Gene expression levels of each sample were normalized relative to GAPDH mRNA as an internal standard and calculated using the 2 −∆∆Ct method [71]. At least three independent experiments were performed for each analysis.

Heterologous Expression of LerHMGR1 and LerHMGR2 in Escherichia coli
The coding sequence (CDS) of LerHMGR1 and LerHMGR2 were subcloned into the BamH I/Hind III sites of the prokaryotic expression vector pET-30a (+) and transformed into E. coli BL21(DE3) using homologous recombination (Vazyme, #C115). After growing the transformants to an OD 600 of 0.6 in 400 mL of lysogeny broth medium at 37 • C, protein expression was induced by adding 0.5 mM of isopropyl β-D-1-thiogalactopyranoside, followed by 20 h of cultivation at 25 • C. The primers used for plasmid construction are listed in Table S9.
After centrifugation, the bacteria were resuspended in 32 mL of 1 × PBS and 1 mM PMSF was added, and ultrasonic crushing was performed for 30 min. Then the bacteria debris was removed by centrifugation at 10,000 g for 20 min. The supernatant was applied to an affinity column filled with 3 mL of Ni-NTA Agarose Resin (Yeasen, #20502ES10, Shanghai, China), and the protein was purified following the manufacturer's instructions. Purified fusion proteins were eluted in an elution buffer containing 50 mM NaH 2 PO 4 (pH 8.0), 300 mM NaCl, and 250 mM imidazole. They were then used for 10% SDS-PAGE and Western blotting was performed with anti-his mouse monoclonal antibody as the primary antibody (TransGen Biotech, #HT501, Beijing, China) and goat anti-mouse IgG, HRP conjugate (TransGen Biotech, #HS201) as the secondary antibody.

In Vitro Enzyme Activity Assay
The enzymatic assays for LerHMGRs were carried out in the manner described by Wilding et al. [72], with minor modifications. A total of 10 µg His-tag purified protein was added to 1 mL of assay buffer (25 mM K 2 HPO 4 , 50 mM KCl, 1 mM EDTA, 5 mM DTT, 0.3 mM NADPH, 0.3 mM HMG-CoA, and pH 7.5) and the reaction system was incubated at 30 • C for 30 min. To stop the enzymatic reaction, 100 µL HCl (6 M) was added to each reaction system and the products were lactonized for 1 h at 25 • C. The products were extracted in 2 mL of ethyl acetate, which was then evaporated and resuspended in 200 µL ethanol absolute. The final products were analyzed using a UPLC-Triple-TOF-MS/MS system (AB SCIEX, Framingham, MA, USA) with a Welch Ultimate XB-C18 column and separation conditions. A DL-Mevalolactone was purchased from Shanghai Yuanye Bio-Tec (Shanghai, China) as the standard. Additionally, a negative control with sterilized water in place of the target LerHMGR protein was added to the reaction mixture for identification.

Subcellular Localization Analysis
The subcellular localization of LerHMGRs was predicted in the endoplasmic reticulum by ProtComp v. 9.0 (http://linux1.softberry.com/berry.phtml?topic=protcomppl& group=programs&subgroup=proloc, accessed on 25 April 2023). Then, using homologous recombination, the CDS sequences of LerHMGR1 and LerHMGR2 were subcloned into the Xba I/BamH I sites of vector pBI121::eGFP to generate pBI121::LerHMGR1-eGFP and pBI121::LerHMGR2-eGFP. A plasmid expressing the ER-mCherry-HDEL protein was used as an endoplasmic reticulum localization marker. At 1:1, A. tumefaciens strain GV3101 containing the fusion expression vector (OD 600 = 1.5) and the same strain containing the ER-mCherry-HDEL vector (OD 600 = 1.5) were simultaneously injected into the leaves of 4-week-old Nicotiana tabacum plants. The fluorescence signal was observed after 72 h using an LSM 880 (Zeiss, Oberkohen, Germany) confocal microscope equipped with an AxioObserver. LerHMGRs-GFP fusion proteins and free GFP were observed in the excitation wavelengths of 488 nm, and mCherry-HDEL was observed in the excitation wavelengths of 561 nm as an ER marker using a Zeiss LSM880 (confocal laser scanning microscope, CLSM). The vector pBI121:: eGFP was stored in our laboratory and the ER-mCherry-HDEL vector was purchased from the MiaoLing Plasmid Platform (http://www.miaolingbio.com/, accessed on 25 April 2023). The primers used for plasmid construction are listed in Table S9.

Co-Expression Network of Transcription Factors and LerHMGR1 and LerHMGR2 Genes
Transcription factors in L. erythrorhizon were identified by using the ITAK (http:// itak.feilab.net/cgi-bin/itak/index.cgi, accessed on 25 April 2023). The expression patterns of 69 types of transcription factors, as well as LerHMGR1 and LerHMGR2, were used for performing the co-expression analysis in the R software using the corr.test () function with Pearson's method. The co-expression network was visualized using Cytoscape_v3.9.1 [73].

Conflicts of Interest:
The authors declare no conflict of interest.